Dataset

Column

'data.frame':   100 obs. of  4 variables:
 $ voivodeship : chr  "mazowieckie" "mazowieckie" "mazowieckie" "mazowieckie" ...
 $ municipality: chr  "urban" "rural" "urban-rural" "urban-rural" ...
 $ income      : num  2761 2866 3312 3033 2356 ...
 $ expenses..  : num  1029 1337 1249 1145 888 ...

Number of rows

100

Row

Average of the income

2944

Average of the expenses

1104

Row

Average of the income in a gauge format

Average of the expenses in a gauge format

Map of Poland

Column

Introduction

Education is one of the most important tasks of the state, to which considerable public funds are allocated every year. It is an area where it is certainly not worth saving, because the level of prosperity in society depends on the way and quality of education. The subject of this study is the per capita expenditure of the budgets of Polish municipalities in 2016 on education and upbringing. In addition, we use information about the income of the budgets of these municipalities, including per capita. The choice of municipalities has been limited to the Małopolskie and Mazowieckie voivodeships only.

Dataset

The data comes from the website of the Central Statistical Office (GUS) (www.stat.gov.pl), from the Local Data Bank section, respectively: • category: Public finances, group: Revenue from the budgets of municipalities and cities with district rights, subgroup: Revenue per capita • category: Public finances, group: Expenditure of the budgets of municipalities and cities with district rights, subgroup: Expenditure per capita

Research Question

• what was the income of municipal budgets in 2016 and the amount of their expenses on education? • did the distribution of expenses vary depending on the province or type of municipality under consideration? • did average expenses differ significantly between urban, urban-rural and rural municipalities? • was there a significant relationship between income and spending?

About Me

Ireneusz Korpusik

I’m a part-time MSc student in IT Carlow. The program code is: CW_SRDAT_M Y5 and a teacher name is: Agnes Maciocha. Project was done in RStudio program with Flexdashboard package.

Page 1

Column

Frequency table

Voivodeship n %
malopolskie 46 46
mazowieckie 54 54
sum 100 100

HighCharter Treemap of Mazowieckie and Malopolskie

Column

Qualitative characteristic

Type of municipality

Type of municipality n %
rural 32 32
urban 25 25
urban-rural 43 43
sum 100 100

Page 2

Column

Bar chart of distinction

Column {data-width=450} ———————————————————————–

Budget revenue

mean sd median min max range skew kurtosis Q0.25 Q0.75
2943.92 423.8 2871.9 2328.38 5338.39 3010.01 2.21 8.89 2719.58 3043.01

Column

Chart C

Page 3

Column

Mean of income


    One Sample t-test

data:  dane$income
t = -1.3233, df = 99, p-value = 0.1888
alternative hypothesis: true mean is not equal to 3000
95 percent confidence interval:
 2859.826 3028.008
sample estimates:
mean of x 
 2943.917 

Column

Budget expenses for education

mean sd median min max range skew kurtosis Q0.25 Q0.75
1103.65 185.27 1097.38 667.44 1625.47 958.03 0.36 0.64 998.62 1192.32

Chart c

Page 4

Column

Chart A


    One Sample t-test

data:  dane$expenses..
t = 5.5948, df = 99, p-value = 1.968e-07
alternative hypothesis: true mean is not equal to 1000
95 percent confidence interval:
 1066.891 1140.413
sample estimates:
mean of x 
 1103.652 

Comparison of distributions

voivodeship mean sd median min max skew kurtosis Q1 Q3
malopolskie 1107.52 187.89 1101.82 667.44 1589.53 0.17 0.21 1016.62 1204.92
mazowieckie 1100.35 184.70 1095.20 673.80 1625.47 0.53 0.93 995.11 1182.82

Column

Chart C

Page 5

Column

Divison by type of municipality

municipality mean sd median min max skew kurtosis Q1 Q3
rural 1174.26 159.78 1174.93 876.58 1589.53 0.45 0.12 1077.93 1266.30
urban 1064.10 228.08 1032.37 673.80 1625.47 0.56 -0.08 955.30 1192.71
urban-rural 1074.10 163.44 1086.92 667.44 1589.52 0.42 1.68 989.69 1134.47

Column

Chart B

Page 6

Column

Analysis of variance
Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  1.6724 0.1932
      97               
Chart B
             Df  Sum Sq Mean Sq F value Pr(>F)  
municipality  2  236205  118103   3.623 0.0304 *
Residuals    97 3161835   32596                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Chart C

    Pairwise comparisons using t tests with pooled SD 

data:  dane$expenses.. and dane$municipality 

            rural urban
urban       0.024 -    
urban-rural 0.019 0.826

P value adjustment method: none 

Column

Chart D

Page 7

Column

Chart A

Column

Chart B

Chart C

[1] 0.4458642

Call:
lm(formula = expenses.. ~ income, data = dane3)

Residuals:
    Min      1Q  Median      3Q     Max 
-468.42  -95.66   15.91   80.75  496.95 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 411.6553   141.7194   2.905  0.00455 ** 
income        0.2364     0.0482   4.906 3.75e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 166.9 on 97 degrees of freedom
Multiple R-squared:  0.1988,    Adjusted R-squared:  0.1905 
F-statistic: 24.07 on 1 and 97 DF,  p-value: 3.746e-06
---
title: "Spending of the budgets of Poland's municipalities on education and upbringing in 2016"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: scroll
    theme: yeti
    navbar:
      - {
       title: "Dashboard", href: "https://github.com/Irek-Korpusik"
       }
       
    source_code: embed
    
#runtime: shiny
---

```{r setup, include=FALSE}
library(flexdashboard)
library(knitr)
library(ggplot2)
library(scales)
library(psych)
library(dplyr)
library(car)
library(readr)
library(lubridate)
library(tidyverse)
library(plotly)
library(DT)
library(highcharter)
library(tidyr)
library(treemap)

# options(shiny.sanitize.errors = FALSE)
```


Dataset {data-navmenu="Home"}
=====================================

 
Column {data-width=650}
-----------------------------------------------------------------------




```{r}
# reading the data
dane <- read.csv2("dane.csv")

 
DT::datatable(dane, rownames = FALSE,
              class = ('cell-border stripe'),
              filter = 'top',
  extensions = 'Buttons', options = 
    list(pageLength = 8, lengthMenu = c(5, 10, 15, 20),
    dom = 'Bfrtip', autoWidth = FALSE,
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print')))
```




```{r}
str(dane)
```



### Number of rows

```{r}
valueBox(length(dane$voivodeship),
         icon = "fa-bars",
         color = "orange")
```

Row {data-width=400}
-----------------------------------------------------------------------

### Average of the income

```{r}
avg_income <- round(mean(dane$income),0)
valueBox(avg_income,
         icon = "glyphicon-circle-arrow-up",
         color = "yellow")
```


### Average of the expenses

```{r}
avg_expenses <- round(mean(dane$expenses..),0)

valueBox(avg_expenses,
         icon = "glyphicon-circle-arrow-down",
         color = "olive")
```

Row {data-width=400}
-----------------------------------------------------------------------

### Average of the income in a gauge format

```{r}
gauge(round(mean(dane$income, na.rm = TRUE),0),
      min = min(dane$income),
      max = max(dane$income),
      symbol = "%",
      gaugeSectors(
          success = c(80,100),
          warning = c(40,79),
          danger = c(0,39)
      ))
```

### Average of the expenses in a gauge format

```{r}
gauge(round(mean(dane$expenses.., na.rm = TRUE),0),
      min = min(dane$expenses..),
      max = max(dane$expenses..),
      symbol = "%",
      gaugeSectors(
          success = c(80,100),
          warning = c(40,79),
          danger = c(0,39)
      ))
```


Map of Poland {data-navmenu="Home"}
=====================================

    
```{r}
map_pl <- get_data_from_map(download_map_data("https://code.highcharts.com/mapdata/countries/pl/pl-all.js"))
```
```{r}
state <- map_pl %>%
  select(`hc-a2`) %>%
  arrange(`hc-a2`)
```

```{r}
hcmap("https://code.highcharts.com/mapdata/countries/pl/pl-all.js",
      joinBy = c("hc-a2","State"),
      name = "Map of Poland",
      dataLabels= list(enabled = TRUE, format= '{point.name}'),
      borderColor = "#000000", borderWidth  = 0.1,
      color = "green",
      tooltip = list(valueDecimals = 0))
```




Column {.tabset}
-----------------------------------------------------------------------

Introduction {data-navmenu="Home"}
=====================================
Education is one of the most important tasks of the state, to which considerable public funds are allocated every year. It is an area where it is certainly not worth saving, because the level of prosperity in society depends on the way and quality of education.
The subject of this study is the per capita expenditure of the budgets of Polish municipalities in 2016 on education and upbringing. In addition, we use information about the income of the budgets of these municipalities, including per capita. The choice of municipalities has been limited to the Małopolskie and Mazowieckie voivodeships only.

Dataset {data-navmenu="Home"}
=====================================

The data comes from the website of the Central Statistical Office (GUS) (www.stat.gov.pl), from the Local Data Bank section, respectively:
•	category: Public finances, group: Revenue from the budgets of municipalities and cities with district rights, subgroup: Revenue per capita
•	category: Public finances, group: Expenditure of the budgets of municipalities and cities with district rights, subgroup: Expenditure per capita


Research Question {data-navmenu="Home"}
=====================================

•	what was the income of municipal budgets in 2016 and the amount of their expenses on education?
•	did the distribution of expenses vary depending on the province or type of municipality under consideration?
•	did average expenses differ significantly between urban, urban-rural and rural municipalities?
•	was there a significant relationship between income and spending?


About Me {data-navmenu="Home"}
=====================================

Ireneusz Korpusik

I'm a part-time MSc student in IT Carlow. The program code is: CW_SRDAT_M Y5 and a teacher name is: Agnes Maciocha.
Project was done in RStudio program with Flexdashboard package. 



Page 1
==================================================

Column {data-width=650}
-----------------------------------------------------------------------

### Frequency table

```{r}
# auxiliary data frame for the variable Voivodeship
x <- dane$voivodeship
tab1 <- as.data.frame(addmargins(table(x), FUN = sum)) # counts n
tab1$pr <- round(as.numeric(addmargins(100*prop.table(table(x)), FUN = sum)), 2) # percentage %
tab1$ods <- as.numeric(addmargins(prop.table(table(x)), FUN = sum)) # interest rate

# frequency table to be displayed (without interest rate)
kable(tab1[,1:3], col.names = c("Voivodeship", "n", "%"))
```
### HighCharter Treemap of Mazowieckie and Malopolskie

```{r}
dane%>%
  group_by(voivodeship)%>%
  summarise(avg_income= round(mean(income, na.rm = TRUE),0)) %>%
  hchart(type = "treemap",
         hcaes(x=voivodeship, value = avg_income, color = avg_income))
```







Column {data-width=400}
-----------------------------------------------------------------------

### Qualitative characteristic 

```{r fig.width = 5, fig.height = 5, fig.align = "center"}
# data frame for the chart (without the sum row)
tab2 <- tab1[-nrow(tab1),]
# pie chart
ggplot(tab2, aes(x = "", y = ods, fill = x)) + 
  geom_col(width = 1, colour = "black") + 
  geom_text(aes(y = ods,  label = paste0(pr,"%")), size = 5, fontface = "bold", 
            position = position_stack(vjust = 0.5))  +	  
  scale_y_continuous(labels = scales::percent_format()) + 
  coord_polar(theta = "y") + 
  labs(fill = "Voivodeship: ") + 
  theme(legend.position = "bottom", legend.text = element_text(colour = "black", size = 12), 
        axis.title.x = element_blank(), axis.title.y = element_blank(), 
        axis.text.y = element_blank(), axis.ticks = element_blank())

```

### Type of municipality

```{r}
# auxiliary data frame for the Type of municipality variable
x <- dane$municipality
tab3 <- as.data.frame(addmargins(table(x), FUN = sum)) # counts n
tab3$pr <- round(as.numeric(addmargins(100*prop.table(table(x)), FUN = sum)), 2) # percentage %
tab3$ods <- as.numeric(addmargins(prop.table(table(x)), FUN = sum)) # interest rate

# frequency table to be displayed (without interest rate)
kable(tab3[,1:3], col.names = c("Type of municipality", "n", "%"))
```
Page 2
==================================================

Column {data-width=550}
-----------------------------------------------------------------------
### Bar chart of distinction

```{r fig.width = 5, fig.height = 5, fig.align = "center"}
# data frame for the chart (without the sum row)
tab4 <- tab3[-nrow(tab3),]



# bar chart
ggplot(tab4, aes(x = x, y = pr)) + 
  geom_col(fill = "#FFFF99" , colour = "black") +
  geom_text(aes(label = paste0(pr,"%")), stat = "identity", size = 5, 
            fontface = "bold", position = position_stack(vjust = 0.5)) +
  scale_x_discrete(labels = wrap_format(25)) +
  ylim(NA, 1.2*max(tab4$pr)) +
  theme(axis.title.x = element_blank(), axis.text.x = element_text(colour = "black", size = 12)) +
  labs(y = "percentage")
```
Column {data-width=450}
-----------------------------------------------------------------------

### Budget revenue
```{r}
kable(describe(dane$income, quant = c(0.25, 0.75))[c(3:5,8:12,14:15)], digits = 2)
```



Column 
-----------------------------------------------------------------------

### Chart C

```{r fig.width = 5, fig.height = 5, fig.align = "center"}
ggplot(dane, aes(x = income)) + 
  geom_histogram(breaks=seq(2000, 5500, by = 500), colour = "black", fill = "#FFFF99") +
  labs(x = "Income in PLN", y = "Number of observations") +
  scale_x_continuous(breaks = seq(2000, 5500, by = 500))
```






Page 3
==================================================
Column {data-width=650}
-----------------------------------------------------------------------

### Mean of income


```{r}
t.test(dane$income, alternative = "two.sided", mu = 3000, conf.level = 0.95)
```
Column {data-width=350}
-----------------------------------------------------------------------

### Budget expenses for education

```{r}
kable(describe(dane$expenses, quant = c(0.25, 0.75))[c(3:5,8:12,14:15)], digits = 2)
```

### Chart c

```{r fig.width = 5, fig.height = 5, fig.align = "center"}
ggplot(dane, aes(x = expenses.. )) + 
  geom_histogram(breaks=seq(600, 1700, by = 100), colour = "black", fill = "#FFFF99") +
  labs(x = "Expenses in PLN", y = "Number of observations") +
  scale_x_continuous(breaks = seq(600, 1700, by = 100))
```

Page 4
==================================================

Column {data-width=650}
-----------------------------------------------------------------------

### Chart A

```{r}
t.test(dane$expenses.., alternative = "two.sided", mu = 1000, conf.level = 0.95)
```

### Comparison of distributions

```{r}
tab5 <- dane %>%
  group_by(voivodeship) %>%
  summarise(mean = mean(expenses..), 
            sd = sd(expenses..), 
            median = median(expenses..), 
            min = min(expenses..), 
            max = max(expenses..),
            skew = skew(expenses..),
            kurtosis = kurtosi(expenses..),
            Q1 = quantile(expenses.., 0.25),
            Q3 = quantile(expenses.., 0.75))
kable(tab5, digits = 2)
```

Column {.tabset}
-----------------------------------------------------------------------


### Chart C

```{r fig.width = 7, fig.height = 7, fig.align = "center"}
ggplot(dane, aes(x = expenses..)) + 
  geom_histogram(breaks=seq(600, 1700, by = 100), colour = "black", fill = "#FFFF99") +
  facet_grid(voivodeship ~ .) +
  labs(x = "Expenses in PLN", y = "Number of observations") +
  scale_x_continuous(breaks = seq(600, 1700, by = 100))
```

Page 5
==================================================

Column {data-width=650}
-----------------------------------------------------------------------

### Divison by type of municipality

```{r}
tab6 <- dane %>%
  group_by(municipality) %>%
  summarise(mean = mean(expenses..), 
            sd = sd(expenses..), 
            median = median(expenses..), 
            min = min(expenses..), 
            max = max(expenses..),
            skew = skew(expenses..),
            kurtosis = kurtosi(expenses..),
            Q1 = quantile(expenses.., 0.25),
            Q3 = quantile(expenses.., 0.75)) %>% as.data.frame
kable(tab6, digits = 2)
```

Column {data-width=650}
-----------------------------------------------------------------------

### Chart B

```{r fig.width = 7, fig.height = 7, fig.align = "center"}
ggplot(dane, aes(x = expenses..)) + 
  geom_histogram(breaks=seq(600, 1700, by = 100), colour = "black", fill = "#FFFF99") +
  facet_grid(municipality ~ .) +
  labs(x = "Expenses in PLN", y = "Number of observations") +
  scale_x_continuous(breaks = seq(600, 1700, by = 100))

```


Page 6
==================================================

Column {data-width=650}
-----------------------------------------------------------------------

##### Analysis of variance

```{r}
leveneTest(dane$expenses.. ~ dane$municipality)
```

##### Chart B

```{r}
model <- aov(expenses.. ~ municipality, data = dane)
summary(model)
```

##### Chart C

```{r}
pairwise.t.test(dane$expenses.., dane$municipality, p.adj = "none")
```

Column {.tabset}
-----------------------------------------------------------------------


### Chart D

```{r fig.width = 5, fig.height = 5, fig.align = "center"}
ggplot(tab6, aes(x = municipality, y = mean)) + 
  geom_col(fill = "#FFFF99" , colour = "black") +
  geom_text(aes(label = round(mean, 2)), size = 5, fontface = "bold", 
            position = position_stack(vjust = 0.5)) +
  scale_x_discrete(labels = wrap_format(25)) +
  ylim(NA, 1.2*max(tab6$mean)) +
  theme(axis.title.x = element_blank(), axis.text.x = element_text(colour = "black", size = 12)) +
  labs(y = "average")
```


Page 7
==================================================

Column {data-width=650}
-----------------------------------------------------------------------

### Chart A

```{r fig.width = 5, fig.height = 5, fig.align = "center"}
ggplot(dane, aes(x = income, y = expenses.., color=expenses..)) + 
  geom_point() + 
  scale_y_continuous(expand = c(0.01, 0)) + 
  xlab("Income") + 
  ylab("Expenses")

```

Column {.tabset}
-----------------------------------------------------------------------

### Chart B

```{r fig.width = 5, fig.height = 5, fig.align = "center"}
# removing the observation from the outlier
dane3 <- dane[dane$income<5000,]

# scatter plot
ggplot(dane3, aes(x = income, y = expenses.., color=income)) + 
  geom_point() + 
  stat_smooth(method = "lm", se = FALSE) + 
  scale_y_continuous(expand = c(0.01, 0)) + 
  xlab("Income") + 
  ylab("Expenses")
  

  
```

### Chart C

```{r}
cor(dane3$income, dane3$expenses..)
```
```{r}
model2 <- lm(expenses.. ~ income, data = dane3) 
summary(model2)
```